Minimum description length (MDL) regularization for online learning

Author

  • Gil I. Shamir
Abstract

An approach inspired by the Minimum Description Length (MDL) principle is proposed for adaptively selecting features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. Several algorithmic variations on the approach are presented. They are based on using a Bayesian mixture in each dimension of the feature space. By applying the MDL principle, the mixture reduces the feature space to the subspace with the lowest loss. Derived loss bounds show that the loss of that subspace is essentially achieved. The approach can be tuned to trade off model size against the loss incurred. Empirical results on large-scale real-world systems demonstrate how it improves such tradeoffs: huge reductions in model size can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (translating to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that this approach eliminates overfitting.
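
The per-dimension mixture idea can be sketched as follows: for each feature, track the cumulative log-loss (sequential code length) of a predictor that uses the feature against one that zeroes it, and keep the feature only while using it codes the data more cheaply. This is an illustrative sketch under simplified assumptions (online logistic regression with hard gating rather than a true Bayesian mixture weight); the class and method names are hypothetical, not from the paper.

```python
import math

class MDLFeatureGate:
    """Per-feature choice between 'use the weight' and 'weight = 0'.

    Tracks the cumulative log-loss (code length) of both hypotheses per
    dimension; a feature is effectively dropped when zeroing it yields
    a shorter description of the labels. Illustrative sketch only.
    """
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.codelen_on = [0.0] * n_features   # bits paid when using the feature
        self.codelen_off = [0.0] * n_features  # bits paid when zeroing it
        self.lr = lr

    def _sigmoid(self, z):
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def active(self, j):
        # Keep the feature only while its code length is competitive.
        return self.codelen_on[j] <= self.codelen_off[j]

    def predict(self, x):
        z = sum(self.w[j] * x[j] for j in range(len(x)) if self.active(j))
        return self._sigmoid(z)

    def update(self, x, y):
        z = sum(self.w[j] * x[j] for j in range(len(x)) if self.active(j))
        p = self._sigmoid(z)
        for j, xj in enumerate(x):
            if xj == 0.0:
                continue
            # Log-loss of this example with and without feature j's contribution.
            p_on = self._sigmoid(z) if self.active(j) else self._sigmoid(z + self.w[j] * xj)
            p_off = self._sigmoid(z - self.w[j] * xj) if self.active(j) else self._sigmoid(z)
            self.codelen_on[j] += -math.log(p_on if y else 1.0 - p_on)
            self.codelen_off[j] += -math.log(p_off if y else 1.0 - p_off)
            # SGD step regardless of gating, so a feature can re-enter later.
            self.w[j] -= self.lr * (p - y) * xj
```

A pruned feature keeps receiving gradient updates, so if it later becomes predictive its cumulative code length can catch up and re-activate it; this mirrors the tunable size-versus-loss tradeoff described in the abstract.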


Related articles

Minimum Description Length Principle

The minimum description length (MDL) principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for. MDL provides a versatile approach to statistical modeling. It is applicable to model selection and regularization. Modern versions of MDL lead to robust methods that are well suited for choosing an ...
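
The two-part coding idea behind MDL can be made concrete: score each candidate model by the bits needed to encode its parameters plus the bits needed to encode the data given the model, and prefer the smallest total. The sketch below is illustrative (fixed-precision parameter coding, Gaussian residual code up to constants); the function name and the 32-bits-per-parameter assumption are ours, not from the text.

```python
import math

def two_part_mdl(rss, n, k, bits_per_param=32):
    """Two-part MDL score (in bits): parameter cost + data cost.

    rss: residual sum of squares of the fitted model
    n:   number of data points
    k:   number of free parameters
    """
    model_bits = k * bits_per_param
    # Gaussian code length of the data at the MLE variance, up to constants.
    sigma2 = max(rss / n, 1e-12)
    data_bits = 0.5 * n * (math.log2(2 * math.pi * math.e) + math.log2(sigma2))
    return model_bits + data_bits
```

With equal fit, the smaller model always wins; a larger model is preferred only when its extra parameters buy enough reduction in residual bits.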


MDL Regularizer: A New Regularizer based on the MDL Principle

This paper proposes a new regularization method based on the MDL (Minimum Description Length) principle. An adequate precision weight vector is trained by approximately truncating the maximum likelihood weight vector. The main advantage of the proposed regularizer over existing ones is that it automatically determines a regularization factor without assuming any specific prior distribution with...
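
The precision-truncation idea can be sketched as follows: quantize the maximum-likelihood weights to b fractional bits, charge b bits for each surviving nonzero weight, and pick the b that minimizes data cost plus parameter cost. The helper names and the per-weight cost model are our illustrative assumptions, not the paper's construction.

```python
import math

def truncate_weights(w, bits):
    """Round each weight to a fixed-point grid with `bits` fractional bits."""
    scale = 2 ** bits
    return [round(wi * scale) / scale for wi in w]

def description_length(w, bits, data_loss):
    """Total code length: data bits plus `bits` per nonzero truncated weight.

    data_loss is the data log-loss in nats; divide by ln 2 to get bits.
    """
    nonzero = sum(1 for wi in truncate_weights(w, bits) if wi != 0.0)
    return data_loss / math.log(2) + nonzero * bits

def best_precision(w, loss_at, max_bits=16):
    """Pick the precision minimizing total description length.

    loss_at(w_trunc) returns the data log-loss (nats) of the truncated weights.
    """
    return min(range(max_bits + 1),
               key=lambda b: description_length(w, b, loss_at(truncate_weights(w, b))))
```

When truncation is free (the loss does not depend on the weights), the coarsest precision wins; as the loss penalty for rounding error grows, the selected precision rises, which is the regularization-factor-free tradeoff the abstract describes.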


Bayesian Ying Yang Learning (II): A New Mechanism for Model Selection and Regularization

Efforts toward a key challenge of statistical learning, namely learning from a finite sample while retaining model selection ability, have been discussed in two typical streams. Bayesian Ying Yang (BYY) harmony learning provides a promising tool for addressing this challenge, with new mechanisms for model selection and regularization. Moreover, not only is BYY harmony learning further j...


Multi-regularization Parameters Estimation for Gaussian Mixture Classifier based on MDL Principle

Regularization addresses the problem of unstable estimation of the covariance matrix from a small sample set in a Gaussian classifier, and estimating multiple regularization parameters is more difficult than estimating a single one. In this paper, the KLIM_L covariance matrix estimator is derived theoretically from the MDL (minimum description length) principle for the small sample problem...


Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties

SUMMARY This paper addresses the problem of learning Bayesian belief networks (BBN) based on the minimum description length (MDL) principle. First, we give a formula for the description length based on which the MDL-based procedure learns a BBN. Secondly, we point out that the difference between the MDL-based and Cooper and Herskovits procedures lies essentially in the priors rather than in the approac...




Publication date: 2015